PPC64 assembly code generated with PR:

fcf8f3e to b606231

Initial benchmarks on an NXP T2080 (e6500) core with a 1.8 GHz core clock:

With PR 9852:

With master:

Oh I did not try with

-O3, AES GCM Table, SHA256 C

Master:

PR 9852 with WOLFSSL_PPC64_ASM, WOLFSSL_PPC64_ASM_INLINE, WOLFSSL_PPC64_ASM_SMALL, WOLFSSL_PPC64_ASM_AES_NO_HARDEN, WOLFSSL_PPC32_ASM, WOLFSSL_PPC32_ASM_INLINE, WOLFSSL_PPC32_ASM_SMALL:

PR 9852 with WOLFSSL_PPC64_ASM, WOLFSSL_PPC64_ASM_INLINE, WOLFSSL_PPC64_ASM_AES_NO_HARDEN, WOLFSSL_PPC32_ASM, WOLFSSL_PPC32_ASM_INLINE:

dgarske left a comment:
Benchmarks posted. Marking approved, but won't consider merge until you have a chance to evaluate results. I will also work on running on an e5500 core.
Excellent! PPC64 ASM AES is something we have been wanting. We use TLS extensively for our RustChain blockchain attestation nodes and Ergo anchor transactions. Available for testing:

We would be happy to benchmark AES-GCM and AES-CTR throughput on POWER8 before and after this PR. Let us know if test results from real hardware would be useful for the review.

Hi David, please run the performance numbers with the latest version of the code. Thanks!

Hi @Scottcjn, I have implemented AES-ECB/CBC/CTR/GCM. Thanks,

I needed this patch:

Here are the results on an NXP T1040 (e5500) at 1.4 GHz running Linux:

Symmetric Ciphers (MiB/s)

POWER8 S824 AES Benchmark Results

Hardware: IBM Power System S824 (8286-42A), dual 8-core POWER8, 512 GB RAM, Ubuntu 20.04

Observations

Analysis

This PR uses scalar T-table AES with GPR instructions rather than the hardware

We verified that

The GCM decryption improvement suggests the baseline C path had a performance issue there that the ASM correctly addresses. Happy to run additional configurations or ECB benchmarks (

Tested on real iron: Elyan Labs POWER8 infrastructure.
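
To make the T-table approach mentioned in the analysis concrete, here is a minimal C sketch of the generic textbook construction. It is not the PR's PPC64 assembly, and all helper names (`xtime`, `gf_mul`, `aes_sbox`, `build_t0`, `round_column`) are hypothetical. Each 32-bit entry of `T0` folds SubBytes together with the MixColumns coefficients (2, 1, 1, 3), so one output column of a middle round costs four lookups, three rotations and a few XORs. The table index is a secret state byte, which is why T-table code can leak through cache timing.

```c
#include <assert.h>
#include <stdint.h>

/* Multiply by x (i.e. by 2) in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1. */
static uint8_t xtime(uint8_t a)
{
    return (uint8_t)((a << 1) ^ ((a & 0x80) ? 0x1b : 0x00));
}

/* General GF(2^8) multiply (shift-and-add). */
static uint8_t gf_mul(uint8_t a, uint8_t b)
{
    uint8_t r = 0;
    while (b != 0) {
        if (b & 1)
            r ^= a;
        a = xtime(a);
        b >>= 1;
    }
    return r;
}

/* AES S-box: multiplicative inverse (0 maps to 0), then the FIPS-197 affine map. */
static uint8_t aes_sbox(uint8_t x)
{
    uint8_t inv = 0, s;
    int i;
    if (x != 0) {
        /* Brute-force inverse search: slow, but only used to build the table once. */
        for (i = 1; i < 256 && inv == 0; i++) {
            if (gf_mul(x, (uint8_t)i) == 1)
                inv = (uint8_t)i;
        }
        assert(inv != 0); /* every non-zero element has an inverse */
    }
    s = inv;
    for (i = 1; i <= 4; i++)
        s ^= (uint8_t)((inv << i) | (inv >> (8 - i))); /* rotate left by i */
    return (uint8_t)(s ^ 0x63);
}

/* T0[x] packs the MixColumns column (2*S[x], S[x], S[x], 3*S[x]) into one word. */
static void build_t0(uint32_t T0[256])
{
    int x;
    for (x = 0; x < 256; x++) {
        uint8_t s = aes_sbox((uint8_t)x);
        uint8_t s2 = xtime(s);
        uint8_t s3 = (uint8_t)(s2 ^ s);
        T0[x] = ((uint32_t)s2 << 24) | ((uint32_t)s << 16) | ((uint32_t)s << 8) | s3;
    }
}

static uint32_t ror32(uint32_t v, unsigned n) /* n must be in 1..31 */
{
    return (v >> n) | (v << (32 - n));
}

/* One output column of a middle round. s0..s3 are state columns c..c+3, so picking
 * byte k from column c+k performs ShiftRows; T1..T3 are byte rotations of T0.
 * The table indices are state bytes, i.e. the accessed cache lines depend on data and key. */
static uint32_t round_column(const uint32_t T0[256], uint32_t s0, uint32_t s1,
                             uint32_t s2, uint32_t s3, uint32_t rk)
{
    return T0[s0 >> 24]
         ^ ror32(T0[(s1 >> 16) & 0xff], 8)
         ^ ror32(T0[(s2 >> 8) & 0xff], 16)
         ^ ror32(T0[s3 & 0xff], 24)
         ^ rk;
}
```

As a check against the FIPS-197 Appendix B example (round 1, column 0: s0 = 0x193de3be, s1 = 0xa0f4e22b, s2 = 0x9ac68d2a, s3 = 0xe9f84808, round key word 0xa0fafe17), `round_column` returns 0xa49c7ff2, the first column of the round 2 input.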
Update: POWER8 hardware AES implementation submitted

Following up on my benchmark results above, I've submitted PR #9932, which uses POWER8's hardware AES crypto instructions (

Quick comparison (AES-128 on POWER8 S824):

The key insight: POWER8 (ISA 2.07, 2013) introduced

The hardware crypto approach is also inherently side-channel resistant (no data-dependent memory accesses), so no cache-line preloading is needed.

I also found a GMAC correctness bug:

Happy to collaborate on merging the approaches; your key expansion and GCM GHASH table work could complement the hardware crypto path nicely. @SparkiDev

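The comment above contrasts hardware AES with T-table code that needs cache-line preloading. The thread does not show what the PR's hardening (disabled by WOLFSSL_PPC64_ASM_AES_NO_HARDEN) actually does, so the following is only a sketch of the general preloading idea, under that assumption; `preload_table` and `preload_sink` are hypothetical names. It reads one word from every cache line of a table before any secret-dependent lookups. The line size is a parameter because it differs between cores (POWER8, for example, uses 128-byte lines). Preloading reduces cache-timing leakage but does not eliminate it: lines can still be evicted between the preload and the lookups.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sink for the loaded values so the compiler cannot discard the loads. */
static volatile uint32_t preload_sink;

/* Hypothetical helper: touch one 32-bit word in every cache line of a lookup table
 * so that the whole table is (ideally) resident before data-dependent lookups.
 * Returns the number of cache lines touched. */
static size_t preload_table(const volatile uint32_t *table, size_t table_bytes,
                            size_t line_bytes)
{
    uint32_t acc = 0;
    size_t off, lines = 0;

    assert(line_bytes >= sizeof(uint32_t) && line_bytes % sizeof(uint32_t) == 0);
    for (off = 0; off < table_bytes; off += line_bytes) {
        acc ^= table[off / sizeof(uint32_t)];
        lines++;
    }
    preload_sink = acc;
    return lines;
}
```

For a 1 KiB table (256 32-bit entries, one T-table), this touches 16 lines with 64-byte lines and 8 lines with 128-byte lines.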
Hi Sean,

Built and benchmarked your latest code on our IBM POWER8 S824 (dual 8-core POWER8, 512 GB RAM, Ubuntu 20.04, GCC 9.4).

Build Notes

Compiled with:

Important: The assembly uses

Benchmark Results (POWER8 S824,

| Mode | This PR (T-table ASM) | PR #9932 (vcipher HW) | Speedup |
|---|---|---|---|
| AES-128-CBC-enc | 95 MiB/s | 960 MiB/s | 10.1x |
| AES-128-CBC-dec | 191 MiB/s | 5,550 MiB/s | 29.1x |
| AES-128-CTR | 94 MiB/s | 5,217 MiB/s | 55.5x |
| AES-256-CTR | 67 MiB/s | 3,866 MiB/s | 57.7x |
| AES-128-ECB | — | 5,819 MiB/s | — |

The POWER8 ISA 2.07 vcipher/vcipherlast instructions execute AES rounds in the vector crypto unit, with single-cycle throughput and 7-cycle latency, which an 8-way interleaved pipeline fills completely. The hardware path also eliminates the side-channel risk of T-table lookups.
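
The interleaving claim is the usual latency-hiding arithmetic. Taking the figures quoted in this comment at face value (7-cycle latency, one round instruction issued per cycle; not independently verified here), the number of independent blocks that must be in flight to keep the unit busy is:

$$
N_{\min} = \lceil L \times R \rceil = \lceil 7\ \text{cycles} \times 1\ \text{round/cycle} \rceil = 7 \le 8
$$

An 8-way interleave therefore covers the latency in modes whose blocks are independent (ECB, CTR, CBC decryption). CBC encryption chains every block to the previous ciphertext, so each round of a single block waits out the full latency, which is consistent with CBC-enc being the slowest vcipher result in the table.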
Happy to run any additional tests or configurations. Would be great to see the T-table approach used as a fallback for pre-POWER8 chips (e6500, etc.) with the hardware crypto path for POWER8+.
— Scott
Lagniappe: vec_perm AES on Power Mac G4 (no hardware crypto needed)

A little something extra: we ran a pure AltiVec vec_perm AES implementation on a 2002 Power Mac G4 Dual (7450 @ 1.25 GHz, Mac OS X Tiger 10.4, GCC 4.0.1). This uses

G4 Results (NIST FIPS-197 test vector verified ✅)

POWER8 S824 Results (same code, no vcipher used)

Why this matters

The

This is the unoptimized "half-table" method (16 vec_perm passes per SubBytes). The Hamburg algebraic decomposition (GF(2^4) tower field via vec_perm) would reduce this to ~6 vec_perm ops, roughly 2.5x faster.

Technique

Code (standalone, ~250 lines, compiles with

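The commenter's ~250-line implementation is not reproduced in this thread. As a generic illustration of how a byte-permute instruction can stand in for table loads, here is a scalar C model that assumes only the documented vec_perm semantics (each index byte selects one of the 32 bytes of a||b using its low 5 bits). It is neither the commenter's half-table method nor the Hamburg decomposition, and `vec_perm_model` and `sub_bytes_perm` are hypothetical names: a 256-entry table is split into eight 32-byte slices, every slice is permuted with the low 5 bits of the input, and the high 3 bits select which result is kept. On real AltiVec the slices live in vector registers, so there is no data-dependent memory access; this scalar model does not have that property and only demonstrates the indexing.

```c
#include <stdint.h>
#include <string.h>

/* Scalar model of AltiVec vec_perm(a, b, idx): result byte i is byte (idx[i] & 31)
 * of the 32-byte concatenation a || b. */
static void vec_perm_model(uint8_t r[16], const uint8_t a[16], const uint8_t b[16],
                           const uint8_t idx[16])
{
    int i;
    for (i = 0; i < 16; i++) {
        unsigned k = idx[i] & 31u;
        r[i] = (k < 16) ? a[k] : b[k - 16];
    }
}

/* Substitute 16 state bytes through a 256-entry table (e.g. the AES S-box) using
 * 8 permutes over 32-byte slices plus a masked select per slice. */
static void sub_bytes_perm(uint8_t out[16], const uint8_t in[16], const uint8_t table[256])
{
    uint8_t tmp[16];
    unsigned slice;
    int i;

    memset(out, 0, 16);
    for (slice = 0; slice < 8; slice++) {
        vec_perm_model(tmp, &table[slice * 32], &table[slice * 32 + 16], in);
        for (i = 0; i < 16; i++) {
            /* On AltiVec this all-ones/all-zeros byte mask would come from a vector
             * compare; the scalar ternary only models that result. */
            uint8_t mask = (uint8_t)(((unsigned)(in[i] >> 5) == slice) ? 0xff : 0x00);
            out[i] |= (uint8_t)(tmp[i] & mask);
        }
    }
}
```

Any 256-byte table works, so the model can be checked by comparing `sub_bytes_perm` against direct indexing for all 256 input values.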
Hi Sean,

Thank you, that means a lot coming from the wolfSSL team. I'll reach out to support for the contributor agreement right away. Happy to continue testing on our POWER8 S824 and vintage PowerPC hardware as needed. We do a lot of work with hardware-level crypto and SIMD optimization at Elyan Labs and would welcome the opportunity to contribute further to wolfSSL's PowerPC support down the road. Looking forward to getting the CLA sorted and this merged.

Best,

c9e119d to 960449d

Added XTS.

960449d to 1c753bd
1c753bd to a0fc364

Description
To turn on assembly:
--enable-ppc64-asm
To build the inline assembly C code:
--enable-ppc64-asm=inline
To disable hardening (when physical access to the device is not possible), define:
WOLFSSL_PPC64_ASM_AES_NO_HARDEN
(for example, via CFLAGS=-DWOLFSSL_PPC64_ASM_AES_NO_HARDEN)
AES-GCM works with either the 4-bit table implementation (default) or the full table implementation:
--enable-aesgcm=table
Using 'table' is faster for both encryption and decryption.
Testing
./configure --disable-shared LDFLAGS=--static --host=powerpc64 CC=powerpc64-linux-gnu-gcc --enable-aesecb --enable-aescbc --enable-aesgcm=table --enable-aesctr CFLAGS=-DWOLFSSL_PPC64_ASM_AES_NO_HARDEN --enable-ppc64-asm
./configure --disable-shared LDFLAGS=--static --host=powerpc64 CC=powerpc64-linux-gnu-gcc --enable-aesecb --enable-aescbc --enable-aesgcm=table --enable-aesctr CFLAGS=-DWOLFSSL_PPC64_ASM_AES_NO_HARDEN --enable-ppc64-asm=inline
./configure --disable-shared LDFLAGS=--static --host=powerpc64 CC=powerpc64-linux-gnu-gcc --enable-aesecb --enable-aescbc --enable-aesgcm=table --enable-aesctr --enable-ppc64-asm
./configure --disable-shared LDFLAGS=--static --host=powerpc64 CC=powerpc64-linux-gnu-gcc --enable-aesecb --enable-aescbc --enable-aesgcm=table --enable-aesctr --enable-ppc64-asm=inline
./configure --disable-shared LDFLAGS=--static --host=powerpc64 CC=powerpc64-linux-gnu-gcc --enable-aesecb --enable-aescbc --enable-aesgcm=table --enable-aesctr